An efficient inverted index technique for XML documents using RDBMS

Authors

  • Chiyoung Seo
  • Sang-Won Lee
  • Hyoung-Joo Kim
Abstract

The inverted index is widely used in the information retrieval field. To support containment queries over structured documents such as XML, it needs to be extended. Previous work proposed an extension for storing the inverted index for XML documents and processing containment queries, and compared two implementation options: using an RDBMS and using an Information Retrieval (IR) engine. However, that extension has two drawbacks. One is that the RDBMS implementation generally performs much worse than the IR engine implementation. The other is that when a containment query is processed in an RDBMS, the number of join operations increases in proportion to the number of containment relationships in the query, and each join always occurs between large relations. To solve these problems, we propose in this paper a novel approach to extending the inverted index for containment query processing, and show its effectiveness through experimental results. In particular, our performance study shows that (1) our RDBMS approach almost always outperforms the previous RDBMS and IR approaches, (2) our RDBMS approach is not far behind our IR approach with respect to performance, and (3) our approach scales with the number of containment relationships in queries. Our results therefore suggest that, without any modifications to the RDBMS engine, a native implementation using an RDBMS can support containment queries as efficiently as an IR implementation. © 2002 Elsevier Science B.V. All rights reserved.
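To make the join-count drawback described in the abstract concrete, the following is a minimal sketch of a containment query over an inverted index stored in relational tables. It assumes a position-interval encoding (each element stored with a begin and end position, each word with a single position), which is a common way to store such an index; the abstract does not give a schema, so the table names, column names, sample documents, and helper functions here are all hypothetical. It illustrates the baseline style of processing, not the novel approach the paper proposes.

```python
import sqlite3

# Hypothetical schema (an assumption, not taken from the paper):
# one row per element occurrence and one row per word occurrence.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE elements (term TEXT, docno INT, begin_pos INT, end_pos INT);
    CREATE TABLE texts    (term TEXT, docno INT, wordno INT);
""")

# Doc 1: <book><title>xml index</title><author>seo</author></book>
# Doc 2: <article><title>sql joins</title></article>
# Positions number every start tag, word, and end tag in document order.
conn.executemany("INSERT INTO elements VALUES (?, ?, ?, ?)", [
    ("book", 1, 1, 9), ("title", 1, 2, 5), ("author", 1, 6, 8),
    ("article", 2, 1, 6), ("title", 2, 2, 5),
])
conn.executemany("INSERT INTO texts VALUES (?, ?, ?)", [
    ("xml", 1, 3), ("index", 1, 4), ("seo", 1, 7),
    ("sql", 2, 3), ("joins", 2, 4),
])

def docs_with_word_in_element(element, word):
    """One containment relationship (element contains word): one join."""
    rows = conn.execute("""
        SELECT DISTINCT e.docno
        FROM elements e JOIN texts t ON e.docno = t.docno
        WHERE e.term = ? AND t.term = ?
          AND e.begin_pos < t.wordno AND t.wordno < e.end_pos
        ORDER BY e.docno
    """, (element, word)).fetchall()
    return [r[0] for r in rows]

def docs_with_nested_match(outer, inner, word):
    """Two containment relationships (outer contains inner contains word): two joins."""
    rows = conn.execute("""
        SELECT DISTINCT o.docno
        FROM elements o
        JOIN elements i ON o.docno = i.docno
        JOIN texts t    ON i.docno = t.docno
        WHERE o.term = ? AND i.term = ? AND t.term = ?
          AND o.begin_pos < i.begin_pos AND i.end_pos < o.end_pos
          AND i.begin_pos < t.wordno AND t.wordno < i.end_pos
        ORDER BY o.docno
    """, (outer, inner, word)).fetchall()
    return [r[0] for r in rows]

print(docs_with_word_in_element("title", "xml"))
print(docs_with_nested_match("book", "title", "xml"))
```

In this sketch each additional containment relationship adds one more join against the index tables, which is the growth in join count that the abstract identifies as a drawback of the earlier RDBMS implementation.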

Similar resources

X-Binder: Path Combining System of XML Documents Based on RDBMS

With the increasing use of XML, considerable research is being conducted on the XML document management systems for more efficient storage and searching of XML documents. Depending on the base systems, these researches can be classified into object-oriented DBMS (OODBMS) and relational DBMS (RDBMS). OODBMS-based systems are better suited to reflect the structure of XML-documents than RDBMS-base...

Efficient indexing technique for XML-based electronic product catalogs

Electronic product catalogs are considered as one of the main components of e-commerce applications. Efficient processing of queries on product catalogs is important for customer satisfaction. In this paper, we present an indexing structure for processing queries efficiently on natively stored XML-based electronic product catalogs. We also present the performance comparison of our index structu...

Efficient evaluation of linear path expressions on large-scale heterogeneous XML documents using information retrieval techniques

We propose XIR-Linear, a method for efficiently evaluating linear path expressions (LPEs) on large-scale heterogeneous XML documents using information retrieval (IR) techniques. LPEs are the primary form of XPath queries, and their evaluation techniques have been researched actively. XPath queries in their general form are partial match queries, and these queries are particularly useful for sea...

XParent: An Efficient RDBMS-Based XML Database System

The Extensible Markup Language (XML) is an emerging standard for data representation and exchange on the Internet. In order to facilitate the task of querying XML documents, efficient storage models for storing XML documents in database systems were studied. There are basically three alternatives: storing XML data in repositories designed for semistructured data [7, 9], in object-oriented datab...

A New Query Engine using Novel Three Dimensional Index for Xml Documents

XML has gained prominence as data storage and exchange format for web applications. This is because there are certain features which are unique to XML like self descriptivism, extensibility and non proprietary text document storage. In spite of all these unique features XML has an inherent limitation of verbosity. This size problem of XML should be dealt with efficiently so that a good compress...

Journal:
  • Information & Software Technology

Volume 45  Issue 

Pages  -

Publication date 2003